In artificial intelligence, Thompson sampling,〔 named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists in choosing the action that maximizes the expected reward with respect to a randomly drawn belief. == Description == Consider a set of contexts , a set of actions , and rewards in . In each round, the player obtains a context , plays an action and receives a reward following a distribution that depends on the context and the issued action. The aim of the player is to play actions such as to maximize the cumulative rewards. The elements of Thompson sampling are as follows: # a likelihood function ; # a set of parameters of the distribution of r; # a prior distribution on these parameters; # past observations triplets ; # a posterior distribution , where is the likelihood function. Thompson sampling consists in playing the action according to the probability that it maximizes the expected reward, i.e. : where is the indicator function. In practice, the rule is implemented by sampling, in each round, a parameter from the posterior , and choosing the action that maximizes , i.e. the expected reward given the parameter, the action and the current context. Conceptually, this means that the player instantiates his beliefs randomly in each round, and then he acts optimally according to them. 抄文引用元・出典: フリー百科事典『 ウィキペディア(Wikipedia)』 ■ウィキペディアで「Thompson sampling」の詳細全文を読む スポンサード リンク